This report reviews the human factors issues associated with
the use of voice technology in the cockpit and summarizes areas
for future research. The current formulation of the LHX
avionics suite is described and the allocation of tasks to voice
in the cockpit is discussed. State-of-the-art speech
recognition technology is reviewed. Finally, a questionnaire
designed to tap pilot opinions concerning the allocation of tasks
to voice input and output in the cockpit is presented. This
questionnaire was designed to be administered to operational AH-1
pilots. Half of the questionnaire deals specifically with the
AH-1 cockpit and the types of tasks pilots would like to have
performed by voice in this existing rotorcraft. The remaining
portion of the questionnaire deals with an undefined rotorcraft
of the future and is aimed at determining what types of tasks
these pilots would like to have performed by voice technology if
anything were possible, i.e., if there were no technological
constraints.

INTRODUCTION

Advances in technology, particularly microprocessor
technology, continue to broaden the scope of military aircraft
missions. Coincident with increased mission complexity and
aircraft performance capabilities are increased demands upon the
pilot who is required to monitor, manage, and interact with these
systems. The computer-driven multifunction display and keyboard
is the primary medium of interaction between the pilot and
various on board systems in emerging cockpit configurations. The
multifunction display can supply vast amounts of information in
a relatively small amount of space. However, the multifunction
keyboard, when it is used alone as a means of interacting with a
multifunction display, places a heavy burden on the pilot's visual
and manual resources. Furthermore, no general guidelines have
been developed for information display formats that help the
pilot process this information quickly and efficiently. New
control/display configurations are needed to fully tap the
expanded information retrieval capabilities proffered by
emerging microprocessor-based avionics.

The Army's new light helicopter program (LHX), planned for
operational use in the mid 1990's, will use highly capable
digital avionics, which will provide greatly improved performance
and mission capabilities relative to existing Army helicopters.
In addition, the crew size may be reduced to one. The complexity
of this aircraft in terms of mission and system requirements,
coupled with the one crewmember, could be the limiting
factor in the successful development of these aircraft.

In deference to the criticality of this issue, research is
being devoted to the design and optimization of the
pilot/aircraft interface in the LHX series of aircraft. Many
functions will be automated based on data fusion techniques and
the use of artificial intelligence. Moreover, based on the
assumption that the pilot's visual input/manual output channels
are already overburdened, voice interaction with avionic systems
will be implemented. Voice command via automatic speech
recognition will provide the means for systems control and
interaction without necessitating the use of the pilot's manual
control resources. Similarly, the use of speech generation as a
means of information display and feedback will reduce the visual
processing load.

Speech technology, both recognition and generation, has
advanced at an extremely rapid rate in the last decade and is
becoming increasingly desirable as a medium of interaction
between humans and computers since it is a natural and efficient
mode of communication that also frees the hands and eyes for
other tasks. The benefits associated with speech technology
particularly suggest its use in the helicopter cockpit where
visual and manual channel loadings are so high. Optimal use of
this technology, however, is dependent upon whether it is
allocated to those human tasks that are fatiguing, difficult, and
distracting. In essence, the primary consideration governing the
integration of speech in the cockpit must be human capabilities
and needs. Since speech technology offers a new dimension in
human/computer interaction, there is a temptation to use it as a
mere replacement for visual/manual operations, such as switching
functions. Although speech technology can replace a switch
closure, one-to-one replacements of visual and manual operations
may not fully exploit the speech interface.

This report will first review the human factors issues
associated with the use of voice technology in the cockpit and
areas for future research will then be summarized. The current
formulation of the LHX avionics suite will be described and the
allocation of tasks to voice in the cockpit will be discussed.
State-of-the-art speech recognition technology will be reviewed.
Finally, a questionnaire designed to tap pilot opinions
concerning the allocation of tasks to voice input and output in
the cockpit will be presented in the appendix. This
questionnaire was designed to be administered to operational AH-1
pilots. Half of the questionnaire deals specifically with the
AH-1 cockpit and the types of tasks pilots would like to have
performed by voice in this existing rotorcraft. The remaining
portion of the questionnaire deals with an undefined rotorcraft
of the future and is aimed at determining what types of tasks
these pilots would like to have performed by voice technology if
anything were possible, i.e., if there were no technological
constraints.

AUTOMATIC SPEECH RECOGNITION

Although the technology is advancing rapidly, state-of-the-art
speech recognition is still in its infancy in many respects.
Numerous  constraints  are  placed  on  the  user  in  terms  of  the 
number  of  words  that  may  be  recognized  at  a  time,  the  speed  with 
which  words  may  be  spoken  in  succession,  the  permissible 
variability  in  the  pronunciation  of  each  word,  and  the  amount  of 
preparation  time  needed  to  use  an  automatic  speech  recognition 
(ASR)  device  in  an  operational  environment.  However,  continuing 
technological  advances  suggest  that  by  the  time  we  determine  how 
best  to  interface  ASR  and  the  human,  these  constraints  may  no 
longer  be  of  concern. 

Before  continuing  with  a  discussion  of  the  more  complex 
issues  associated  with  the  use  of  ASR  in  the  cockpit,  a  brief 
functional description of this technology is warranted, as is the
definition of some of the phraseology.

SPEAKER DEPENDENT VS. INDEPENDENT RECOGNITION

Computer recognition of speech can be classified as either
speaker dependent or speaker independent, with the former being
easier to accomplish than the latter. Speaker independent means
that the device will recognize words spoken by many different
speakers, based on only one set of templates. This type of
speech recognition is more difficult to accomplish than speaker
dependent recognition since human speech patterns, like
fingerprints, are unique to each individual. The trick to
accomplishing independent speech recognition is to distill the
salient features for each word that are common to every
individual's utterance of that word. These "universal" features
then comprise the reference template for that particular word.
It is readily apparent that reference templates formed and used
by only one speaker in a speaker dependent system will be much
richer in linguistic content (hence yielding better accuracy)
than those templates created for use by many speakers.

Due  to  state-of-the-art  limitations  in  the  creation  of 
independent  speech  recognition  reference  templates,  these  devices 
are  primarily  limited  to  recognition  of  the  digits  zero  through 
nine  and  are  further  constrained  by  user  dialects.  For  example, 
an  independent  speech  recognition  device  which  uses  templates 
formed  from  typically  "southern"  speech  will  not  recognize  those 
same  words  as  accurately  when  spoken  with  a  "northern"  accent. 

A speaker dependent system requires that each user form one
set of templates for each word in the working vocabulary. During
the training phase the user repeats each word in the specified
vocabulary from one to ten times. The exact number of
repetitions is dependent both upon the particular device in use
and upon the complexity of the vocabulary. The templates are
then maintained in the system memory so that during operational
use of the machine each incoming utterance is compared to these
reference templates. The template that matches most closely is
then chosen as the spoken utterance.
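
The comparison step described above can be pictured as a
nearest-template search. The following Python sketch is
illustrative only: the three-element feature vectors, the
Euclidean distance measure, and the names distance and recognize
are assumptions made for the example and do not describe any
particular ASR device.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(utterance, templates):
    """Return the vocabulary word whose stored template is closest."""
    return min(templates, key=lambda word: distance(utterance, templates[word]))

# Hypothetical reference templates formed during the training phase.
templates = {
    "tune": [0.9, 0.1, 0.4],
    "slew": [0.2, 0.8, 0.5],
    "radio": [0.5, 0.5, 0.9],
}

# An incoming utterance that resembles, but does not equal, a template.
print(recognize([0.85, 0.15, 0.35], templates))
```

Because the closest template is always chosen, a sketch of this
kind never rejects an utterance; a rejection threshold would be
needed for that.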

Two distinct approaches to the creation of these reference
templates have been adopted. One method averages the
repetitions of each word in the vocabulary. Typically, this
training method requires three or more repetitions of each word
in the vocabulary. The resulting templates then are an
"averaged" representation of each word that accounts for slight
variations in the pronunciation of these words. With respect to
the number of repetitions needed to create optimal reference
templates using the averaging technique, more is not always
better. There is a point at which additional repetitions cause
the templates to lose their clarity. Generally, the manufacturer
will recommend the appropriate number of repetitions. A balance
must be achieved between too few repetitions (which yields
incomplete templates) and too many repetitions (which blurs
the templates).

The other way in which reference templates are created
typically requires only one or two repetitions of each vocabulary
word. These templates are maintained separately in memory for
comparison.
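
The difference between the two training approaches can be
sketched as follows. This is a minimal illustration under assumed
conditions: each repetition is represented by a short list of
numbers, and the function names averaged_template and
separate_templates are hypothetical.

```python
def averaged_template(repetitions):
    """Combine several repetitions of one word into a single template
    by averaging the corresponding feature values."""
    count = len(repetitions)
    return [sum(values) / count for values in zip(*repetitions)]

def separate_templates(repetitions):
    """Keep every repetition as its own template for later comparison."""
    return [list(repetition) for repetition in repetitions]

# Three hypothetical repetitions of the same word.
repetitions = [
    [1.0, 2.0, 3.0],
    [1.2, 1.8, 3.0],
    [0.8, 2.2, 3.0],
]

print(averaged_template(repetitions))        # one smoothed template
print(len(separate_templates(repetitions)))  # one template per repetition
```

The averaged form needs less memory per word, while the separate
form preserves each pronunciation exactly as it was spoken.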

Poock (1982) has shown that a particular speaker dependent
system can achieve a limited degree of speaker independence by
having several speakers repeat the vocabulary during one training
session. Because the device uses the averaging technique it
produces a set of reference templates with speech characteristics
representative of each speaker. Thus, several speakers can use
the device concurrently without having to load separate templates
for each individual.

For the most part, however, optimal performance in terms
of recognition accuracy will be obtained when recognition is
accomplished by one user at a time, based on his or her own set
of reference templates.

DISCRETE VS. CONNECTED/CONTINUOUS WORD RECOGNITION

The next issue of importance with respect to ASR is that of
discrete vs. connected or continuous word recognition
capabilities. A discrete word recognition device, which is the
most common type currently available, will recognize single
utterances or short phrases (typically up to 1.5 s without pause)
in isolation. The user must pause for a predefined length of
time (approximately 200 ms) between each utterance. This pause
requirement facilitates the endpoint detection of each utterance.
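
The role of the pause in endpoint detection can be sketched as
follows. The example assumes that the signal has already been
reduced to one energy value per 10 ms frame and that a fixed
energy threshold separates speech from silence; both assumptions,
and the function name find_utterances, are illustrative and are
not taken from any particular device.

```python
def find_utterances(energies, frame_ms=10, threshold=0.1, min_pause_ms=200):
    """Return (first_frame, last_frame) index pairs, one per utterance.

    An utterance is considered finished once the energy has stayed
    below the threshold for at least min_pause_ms.
    """
    pause_frames = min_pause_ms // frame_ms
    utterances = []
    start = None        # first voiced frame of the current utterance
    last_voiced = None  # most recent voiced frame
    for index, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = index
            last_voiced = index
        elif start is not None and index - last_voiced >= pause_frames:
            utterances.append((start, last_voiced))
            start = None
    if start is not None:
        utterances.append((start, last_voiced))
    return utterances

# 300 ms of speech, a 250 ms pause, then 200 ms of speech.
signal = [0.0] * 5 + [1.0] * 30 + [0.0] * 25 + [1.0] * 20 + [0.0] * 5
print(find_utterances(signal))
```

A pause shorter than the minimum would leave the two stretches of
speech merged into a single utterance, which is why the user of a
discrete word recognizer must pause deliberately.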

Connected word recognition allows the user to input a short
string of words in a connected fashion. Typically, connected
word recognition is used with the digits for entering number
sequences such as telephone numbers. Connected word recognition,
or high speed voice input capability as it is sometimes called,
is just beginning to be available commercially at a reasonable
price. Connected word recognition capabilities are still quite
constrained with respect to the number and type of words that can
be recognized in this manner. Continuous word recognition
implies the capability to input an unconstrained number of words
in a continuous manner (like conversational speech). No
commercially available system yet has this capability, and it
will probably not be available in the near future. Both
connected and continuous speech recognition are more difficult to
achieve than discrete word recognition because of two related
problems. First, when speech flows freely in connected form,
word boundaries are extremely hard to detect. Second, words [...]

[...] enormous task of sorting through the complexities of
conversational speech by limiting the task to the recognition of
connected digit strings and to structured command sequences.
This structured command language is incorporated into a system by
the use of syntax, which represents all the valid word sequences
that constitute commands to an ASR system. Syntax structures
limit the number of possible words for recognition to those which
are valid at that point in the command sequence. For example,
syntax structures might be used to aid ASR in the cockpit for a
[...] the word "radio" and then look for a string of digits. However,
the recognizer would not look for any "nav" functions. The clever
use of syntax structures, therefore, limits the number of active
word choices at each point in the command sequence. This method
is clearly more efficient than choosing among all the words in
the vocabulary at all times.
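
The way a syntax structure narrows the active vocabulary can be
sketched as a small state table. The states and the "nav" words
("waypoint", "heading") below are hypothetical and serve only to
illustrate the idea; the report does not specify an actual
command syntax.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Each state lists the words valid at that point in a command and
# the state that follows each word. All entries are illustrative.
SYNTAX = {
    "start": {"radio": "radio_digits", "nav": "nav_function"},
    "radio_digits": {digit: "radio_digits" for digit in DIGITS},
    "nav_function": {"waypoint": "start", "heading": "start"},
}

def active_words(state):
    """Words the recognizer needs to consider in the given state."""
    return sorted(SYNTAX[state])

def advance(state, word):
    """Move to the next state, refusing words that are not active."""
    if word not in SYNTAX[state]:
        raise ValueError(f"{word!r} is not valid in state {state!r}")
    return SYNTAX[state][word]

state = advance("start", "radio")
print(active_words(state))  # only the ten digits remain as candidates
```

After "radio" has been recognized, the recognizer compares the
next utterance against ten templates rather than the whole
vocabulary.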

PERFORMANCE MEASUREMENT

There are two types of errors associated with ASR devices.
Substitution errors or misses comprise the incorrect recognition
of an utterance. For example, the user says "TUNE" and the
machine recognizes the word "SLEW." This type of error is by far
the most critical in the aircraft environment.

The second type, rejection errors, occurs when an incoming
utterance fails to match any of the reference templates in
memory. Most commercially available ASR devices have a
user-selectable rejection threshold. This threshold dictates the
number of bits that must match between an incoming utterance and
a reference template for recognition to occur. A trade-off
occurs when selecting a rejection threshold. With a stringent
setting few, if any, substitution errors will occur, at the expense
of increased utterance rejections. Thus, the user may have to
repeat a word several times for classification to occur. With
less stringent rejection threshold settings, the machine will
attempt to classify all utterances, thereby increasing
substitution errors with a concurrent decrease in rejections. An
optimal rejection threshold is one in which substitution errors
are virtually eliminated while rejections are kept to a minimum.
Although substitution errors are clearly the less desirable of
the two types of errors, the need to repeat an utterance
frequently can be extremely annoying.

A standardized performance metric for the various ASR
devices has yet to be accepted. There is currently no generally
accepted way to weight the relative seriousness of a substitution
error as opposed to a rejection error. Furthermore, a standard
method for comparison of ASR devices has yet to be adopted.
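
The trade-off governed by the rejection threshold can be
illustrated with a small sketch. It assumes, purely for the
example, that each template and each utterance is a ten-bit
pattern and that the score is the count of matching bits; the bit
patterns and the function names are hypothetical.

```python
def matching_bits(a, b):
    """Count the positions at which two bit strings agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

def classify(utterance, templates, threshold):
    """Return the best-matching word, or None (a rejection) when even
    the best match falls below the threshold."""
    best = max(templates, key=lambda word: matching_bits(utterance, templates[word]))
    if matching_bits(utterance, templates[best]) >= threshold:
        return best
    return None

templates = {
    "TUNE": "1111000011",
    "SLEW": "0000111100",
}

# A badly degraded utterance of "TUNE" that now resembles "SLEW".
degraded = "0100111011"

print(classify(degraded, templates, threshold=5))  # lenient setting
print(classify(degraded, templates, threshold=8))  # stringent setting
```

With the lenient setting the degraded utterance is accepted as the
wrong word (a substitution error); with the stringent setting it is
rejected and the user would have to repeat it.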

SPEECH GENERATION

DIGITIZED VS. SYNTHESIZED SPEECH

Speech generation can be accomplished in several ways.
Digitized speech is produced by converting analog speech signals
to a digital waveform. The computer records the waveform by
sampling the signal's voltage periodically through an analog to
digital (A/D) converter and storing each sample as a binary value.
The resulting binary data is then kept until needed, at which
time the original waveform is recreated by sequentially sending
the stored values to a digital to analog (D/A) converter at the
same rate as the original sampling.

There is a trade-off involved with digitizing speech. The
bit density used to recreate the speech can be raised or lowered.
Lowering the bit density takes up less memory, but the
quality of speech is also degraded. Raising the bit density
improves the quality of the speech until it is nearly
indistinguishable from analog recorded human speech, but at the
cost of a large amount of memory. Therefore, the user must
decide on an appropriate compromise for a particular application.
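
The memory/quality compromise can be made concrete with a short
calculation. The sketch below is illustrative only: the 8000
samples per second rate, the 4 and 8 bit sample sizes, and the
function names are assumptions chosen for the example, not values
taken from any device discussed in this report.

```python
import math

def storage_bytes(sample_rate_hz, bits_per_sample, seconds):
    """Memory needed to store digitized speech of the given duration."""
    return sample_rate_hz * bits_per_sample * seconds / 8

def quantize(sample, bits):
    """Round a sample in the range [-1.0, 1.0] to the nearest of
    2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return round((sample + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

def worst_error(samples, bits):
    """Largest difference between original and quantized samples."""
    return max(abs(s - quantize(s, bits)) for s in samples)

# One cycle of a sine wave stands in for a speech waveform.
samples = [math.sin(2 * math.pi * k / 100) for k in range(100)]

for bits in (4, 8):
    print(bits, "bits:",
          storage_bytes(8000, bits, 1.0), "bytes per second,",
          "worst error", worst_error(samples, bits))
```

Halving the number of bits per sample halves the memory required
but enlarges the quantization error, which is the degradation in
quality described above.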

Speech synthesis, another type of speech generation,
typically employs a synthesis-by-rule scheme using formant
resonators. A formant resonator speech synthesizer models the
human vocal tract and can reproduce the approximately 40 phonemes
which comprise the English language. Phonemes may be defined as
the set of the smallest units of speech that distinguish one
utterance or word from another in a given language. High quality
speech synthesis is dependent on how well transitions from one
phoneme to another are handled, e.g., from vowel to consonant and
consonant to vowel. Furthermore, accuracy of the timing of the
generated phonemic segments also contributes to the quality of
the synthetic speech. Finally, the phonetic accuracy of the
segments of speech is crucial to the production of high quality
speech synthesis.

Text-to-speech rules, when used in conjunction with a speech
synthesis technique, provide the user with real-time unlimited
word production capabilities. Currently the text-to-speech
software needed to produce unlimited speech generation
capabilities requires approximately 16k of memory. Text-to-speech
algorithms are a hierarchical set of linguistic rules
and are entirely software based. When these rules are imposed on
a particular synthesis technique, they provide the means whereby
individual phonemes may be concatenated to produce realistic
sounding speech.

The  quality  of  speech  synthesis  when  coupled  with  text-to- 
speech  rules  is  dependent  not  only  on  how  well  the  synthesis  is 
executed  but  also  on  the  particular  linguistic  rules  which 
comprise  the  text-to-speech  software.  Since  no  standards 
pertaining  to  these  rules  have  been  created,  they  can  be  more  or 
less  accurate  phonetically  depending  upon  the  manufacturer 
(Simpson, 1983).

In essence, the quality of synthesized speech is contingent
upon both the hardware and software used to generate the speech.
No one synthesis technique is intrinsically better than another.
Rather, a particular technique's success or lack thereof is
dependent upon how well it is executed (Simpson, 1983). Current
speech synthesis technology tends to produce rather mechanical
sounding speech. Listeners will often perceive a foreign accent
in the speech produced by a synthesizer. This appears to be
attributable to the fact that the rules that govern the human
speech code are very complex and the fact that not all of these
rules are known at this time.

Today's speech generation technology, both digitization and
synthesis, shares a common weakness in determining the place of
articulation features for consonants. Further research is needed
to determine exactly what speech cue makes us hear the place of
articulation.

PERFORMANCE  MEASUREMENT 

Typically,  intelligibility  is  used  as  the  standard 
performance  measure  of  both  digitized  and  synthesized  speech. 
There  is  a  tendency,  however,  to  measure  intelligibility  based  on 
single words spoken in isolation, thereby eliminating any
contextual  cues  that  may  aid  in  overall  comprehensibility.  Since 
human  communications  are  rarely  conducted  in  an  isolated  word 
fashion,  a  more  realistic  performance  metric  might  be  one  in 
which  intelligibility  is  measured  for  phrases,  sentences,  or  some 
meaningful  word  group. 

AUTOMATIC SPEECH RECOGNITION IN THE FLIGHT ENVIRONMENT

Although the pilot flying a high workload mission stands to
gain tremendously from the use of voice command, the
environmental, physical, and emotional factors impinging upon the
pilot make speech recognition difficult to achieve reliably in
the flight environment. Noise, vibration, stress, fatigue, and
workload all act upon the pilot throughout any mission. These
environmental and human effects manifest themselves to the
speech recognition device as radically varying speech patterns
for any given word in the operational vocabulary. Although
problems such as noise and user stress and fatigue are not
unique to the cockpit application of ASR technology, they are
intensified  and  their  effects  are  perhaps  more  critical  than  in 
industrial  or  office  environments.  However,  the  need  to  aid  the 
pilot  in  his  increasingly  demanding  job  has  motivated 
considerable  research  directed  towards  overcoming  these  problems. 
In  the  following  section  many  of  these  factors  will  be  examined. 

AMBIENT NOISE

A major problem associated with the use of ASR in the flight
environment concerns ambient cockpit noise and the creation of
reference templates. Should an on board ASR system (either
speaker dependent or independent) be trained in the presence of
ambient cockpit noise, or will reference templates created in the
absence of noise be adequate for use in flight?
Research conducted at NASA-Ames Research Center (Coler, Plummer &
Huff, 1983; Kersteen, 1982) indicates that when an isolated word
ASR system is trained in a quiet environment and recognition is
then attempted using these training templates in the presence of
noise (95-100 dBA of helicopter noise), obtained recognition
accuracy rates are quite low (78%). Conversely, if the system is
trained  in  the  presence  of  background  noise  and  recognition  is 
conducted  in  that  same  ambient  noise  level,  accuracy  rates  are 
quite  high  (97%).  These  results  are  attributable  to  the  fact 
that  when  training  occurs  in  a  relatively  quiet  environment  and 
recognition  then  takes  place  in  the  presence  of  noise,  the 
training  templates  simply  do  not  reflect  the  noise  component. 
Thus,  the  match  between  the  templates  and  the  incoming  utterance 
is  poor,  yielding  low  levels  of  recognition  accuracy. 
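
The mismatch described above can be illustrated with a toy
example. It assumes, as a deliberate simplification, that noise
adds a fixed offset to each feature of an utterance; the feature
values, the noise values, and the function names are all
hypothetical and are not drawn from the NASA-Ames studies.

```python
def add_noise(features, noise):
    """Model noise as a fixed additive offset on each feature."""
    return [f + n for f, n in zip(features, noise)]

def distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def recognize(utterance, templates):
    """Return the word whose template is closest to the utterance."""
    return min(templates, key=lambda word: distance(utterance, templates[word]))

cockpit_noise = [0.3, 0.5, 0.2]  # hypothetical noise contribution

# Templates trained on the ground in quiet conditions.
quiet_templates = {
    "tune": [0.9, 0.1, 0.4],
    "slew": [1.1, 0.7, 0.5],
}

# The same words trained in the presence of the cockpit noise.
noisy_templates = {
    word: add_noise(features, cockpit_noise)
    for word, features in quiet_templates.items()
}

# The pilot says "tune" in flight, so the noise is part of the input.
in_flight = add_noise(quiet_templates["tune"], cockpit_noise)

print(recognize(in_flight, quiet_templates))  # mismatched training
print(recognize(in_flight, noisy_templates))  # matched training
```

In this sketch the quiet templates lead to a substitution, while
templates that already contain the noise component match the
in-flight utterance correctly.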

Obviously,  the  need  to  create  reference  templates  by 
iterating  the  entire  operational  vocabulary  several  times  during 
flight  is  both  distracting  and  annoying  to  the  pilot.  There  are, 
however,  several  possible  solutions.  First,  an  algorithm  that 
continually  samples  background  noise  and  incorporates  this  noise 
into  the  reference  template  may  alleviate  the  problem.  However, 
there  is  currently  no  algorithm  that  can  update  the  templates 
fast  enough  to  keep  up  with  rapidly  changing  ambient  cockpit 
noise  levels.  Second,  simulated  cockpit  noise  may  provide  enough 
fidelity  that  a  pilot  could  create  adequate  reference  templates 
on  the  ground  in  the  presence  of  this  simulated  noise.  These 
templates  would  then  be  loaded  into  the  aircraft  avionics  suite 
for use in flight along with other specifics. Finally, the use
of better soundproofing materials in the cockpit may reduce
noise to an acceptable operational level for an ASR device in
future rotorcraft.

UPDATING REFERENCE TEMPLATES

A second major problem relates to the length of time one set
of reference templates can be used before retraining is needed,
since speech patterns change with time, stress, and fatigue.
Does the pilot need to train the ASR system prior to every flight,
or will one set of reference templates be valid for a week or a
month given that the vocabulary does not change? Furthermore,
will the pilot need to retrain the system on some words during
the course of a mission? The effects of stress and fatigue on
speech characteristics are more difficult to isolate because they
can operate either singly or in combination on the pilot. Stress
levels are likely to vary drastically during the course of a
given mission. Does this mean that during times of high stress,
incoming recognition utterances will be so different that
accurate recognition cannot occur? Once again, an algorithm that
updates  the  reference  templates  not  only  with  background  noise 
characteristics  but  also  with  changing  speech  pattern 
characteristics  may  help  solve  this  problem.  Clearly,  more 
research  pertaining  to  the  effects  of  time,  stress,  and  fatigue 
on  speech  patterns  is  needed. 

STORAGE MEDIA

A more technical issue related to the use of ASR in flight
concerns the best storage medium for the reference templates in
the flight environment. A variety of storage devices are
available, such as magnetic tape, bubble memory, etc.
Furthermore, it is possible that magnetic strips like those found
on credit cards may become available for the storage of reference
templates. Whatever device is chosen for the cockpit
application, it must be compact, lightweight, non-volatile, heat
and shock resistant, and long-lasting.

ACTIVATION  OF  THE  VOICE  SYSTEM 

To  use  voice  command  in  the  cockpit,  there  must  be  some  way 
to  activate  the  speech  recognition  system.  There  are  several 
alternatives  for  accomplishing  this  task;  however,  little  or  no 
research  has  addressed  which  alternative  is  the  safest,  most 
acceptable,  and  least  obtrusive.  One  alternative  is  to  install  a 
push-to-talk  switch  in  the  cockpit.  The  pilot  would  have  to 
activate  this  switch  with  each  input  to  the  recognizer.  Another 
alternative  would  be  to  leave  the  device  in  a  continual  ready 
mode,  with  the  hope  that  accidental  activation  does  not  occur. 
Finally,  the  device  could  be  left  in  the  ready  mode,  waiting  for 
a  key  word  which  signals  the  device  to  prepare  for  input. 
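
The three activation alternatives can be compared with a small
sketch. The key word "computer", the sample word list, and all
names in the code are hypothetical; the sketch shows only which
spoken words would reach the recognizer under each scheme.

```python
from enum import Enum

class Activation(Enum):
    PUSH_TO_TALK = 1   # pilot holds a switch while speaking
    ALWAYS_READY = 2   # device listens continually
    KEY_WORD = 3       # device waits for a key word

def accepted_words(words, mode, switch_pressed=False, key_word="computer"):
    """Return the spoken words the recognizer would act upon."""
    words = list(words)
    if mode is Activation.PUSH_TO_TALK:
        return words if switch_pressed else []
    if mode is Activation.ALWAYS_READY:
        return words
    # KEY_WORD: ignore everything up to and including the key word.
    if key_word in words:
        return words[words.index(key_word) + 1:]
    return []

# "roger" is radio conversation; the command begins after it.
spoken = ["roger", "computer", "radio", "one"]

for mode in Activation:
    print(mode.name, accepted_words(spoken, mode, switch_pressed=True))
```

In the continual ready mode the conversational word "roger" is
passed to the recognizer, which is the accidental activation risk
noted above; the push-to-talk scheme avoids it at the cost of a
manual action.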

COMMAND LANGUAGE

It has already been mentioned that connected speech
recognition capabilities are becoming commercially available.
These capabilities will probably be expanded beyond the current
ability to recognize connected digits by the mid 1990's
timeframe. Connected word recognition capabilities (as opposed to
isolated word recognition) are clearly needed in the cockpit if
workload is to be reduced, rather than increased, with voice
command. The nature of the command language and syntax
structure used between human and aircraft deserves considerable
attention. It is crucial that the command language be as natural
for  the  pilot  as  possible.  More  specifically,  pilots  will  accept 
and  learn  a  language  using  "pilot  jargon"  more  easily  than  an 
unnatural  command  language.  Additionally,  command  sequences  to 
an  ASR  device  that  capitalize  upon  the  way  a  pilot  normally 
interacts  with  another  crewmember  will  be  learned  and  remembered 
better.  The  naturalness  of  the  command  sequence  will  become 
critical  during  times  of  high  workload  when  the  pilot  has  little 
available  capacity  to  remember  a  given  command  sequence. 
Furthermore,  the  command  language  and  syntax  structure  must  be 
flexible  enough  that  the  pilot  can  express  a  command  to  the  ASR 
device  in  any  of  several  ways.  Again,  this  capability  will 
reduce  any  additional  cognitive  burden  associated  with 
remembering  a  specific,  rigid  command  sequence.  In  essence  the 
command  language  used  in  a  cockpit  should  be  designed  to  reduce 
rather  than  increase  the  pilot's  cognitive  load. 

RESEARCH  ISSUES 

Table  1  summarizes  the  research  issues  concerning  the  use  of 
ASR in the helicopter cockpit.

TABLE 1

Automatic Speech Recognition Research Issues

1. How should the degrading effects of background noise on ASR
accuracy be dealt with in the cockpit?

2. How long can reference templates be stored and then used with
acceptable recognition accuracy rates?

3. What effects do stress and fatigue have on speech patterns
and hence on ASR accuracy?

4. If a reference template requires updating or retraining
during flight, how should this be accomplished and how should the
pilot be made aware of this requirement without disrupting
primary flight tasks?

5. What storage media for the reference templates will be best
for the flight environment?

6. What is the best way to activate the ASR device and prepare
it for input?

7. If a connected word recognizer is used, how should the
command language between the pilot and aircraft be structured?


SPEECH  GENERATION  IN  THE  FLIGHT  ENVIRONMENT 


Speech  generation  has  been  considered  for  two  main 
functions  in  the  cockpit:  for  conveying  caution,  warning  and 
alert  type  messages  and  as  a  prompt  or  feedback  response  to  voice 
recognition  input.  Voiced  alert  messages  in  the  cockpit  have 
been in existence for a number of years now. This capability has
two advantages. First, it alerts or warns the
pilot without diverting visual attention. Second, voiced
alerts or warnings convey more information than traditional
bells, buzzers, tones, etc. Voice warnings have also been
suggested  for  articulating  system  failures  and  threat  detection 
messages  in  the  LHX  cockpit. 

SYNTHESIZED  VS  DIGITIZED  SPEECH 

For the aircraft cockpit, synthesized speech is more
flexible than digitized speech. Furthermore, a synthetic
speech-by-rule system does not have the vocabulary limitations
that are found in a digitized speech system. With digitized
speech, every word needed for an application must be identified,
digitized, and then stored. Synthesis systems have virtually
unlimited vocabulary. Digitized speech systems pose two problems
for an aircraft application: they limit flexibility in that the
number of usable words is fixed, and vocabulary size must be kept
to a minimum or memory requirements and access time become
unacceptable.

Because synthesized speech sounds mechanical, it works well
as a voice warning system since it stands out against the
background radio communications ongoing in the cockpit. Simpson
(date illegible) suggests that a high fidelity
representation of human speech enunciating a warning message
might very easily blend with other ongoing cockpit
communications, whereas a more mechanical sounding speech will
stand out.

INTELLIGIBILITY OF SYNTHESIZED SPEECH


An  important  consideration  in  the  integration  of  speech 
synthesis  in  the  cockpit  relates  to  its  intelligibility. 
Several researchers present evidence suggesting that
rule-generated synthetic speech may be less intelligible than natural
speech or speech digitized at a high data rate. Using the MITalk
unrestricted text-to-speech synthesizer, Pisoni and Hunnicutt
(1980) found that phoneme recognition for synthetic speech was
93.1% compared to 99.4% for natural speech. These researchers
concluded that the difficulties observed in the perception and
comprehension of synthetic speech are due to increased processing
demands in short-term memory.

An  alternative  explanation  might  be  that  the  decrease  in 
performance  associated  with  synthetic  speech  is  due  to  a  lack  of 
familiarity  with  its  distinctive  "accent".  In  other  words,  the 
intelligibility  of  synthetic  speech  might  be  no  less  than 
listening  to  a  person  speak  with  a  foreign  accent.  The  point  to 
be  made  here  is  that  there  may  be  nothing  inherent  in  synthetic 
speech  that  makes  it  less  intelligible  than  natural  speech.  In 
fact  it  may  be  more  accurate  to  regard  the  two  as  points  on  a 
continuum rather than as two separate entities. The
intelligibility of human speech varies with the listener's
familiarity  with  the  accent  as  does  the  intelligibility  of 
synthetic  speech.  Clearly,  further  research  with  respect  to  the 
issue  of  training  and  familiarity  as  it  relates  to  the 
intelligibility  of  synthetic  speech  is  needed  prior  to  its 
integration  in  the  cockpit. 

A  related  issue  is  the  need  to  compare  the  intelligibility 
and  comprehensibility  of  various  commercially  available  speech 
synthesis  devices  among  themselves,  rather  than  continue  to 
compare  human  speech  with  one  particular  brand  of  speech 
synthesis. The comparison of human speech and synthesized speech
has no point of reference if a baseline has not been established
for  the  differential  intelligibility  of  the  various  commercially 
available  speech  synthesis  devices. 

SPEECH  PITCH  AND  RATE 

In  addition  to  the  unlimited  vocabulary  capability  provided 
by  text-to-speech  synthesis  techniques,  almost  all  speech 
synthesizers  have  adjustable  speech  pitch  and  rate  capabilities. 
Though these additional capabilities provide flexibility to
the user or system designer, their interactive and/or additive
effects on intelligibility and comprehension need to be
considered. Simpson and Marchionda-Frost (1983) conducted a
study which addressed the effects of speech pitch and rate in
the presence of 85 dBA of simulated helicopter noise. These
experimenters hypothesized that synthesized speech with a
fundamental frequency above the frequency range of the highest
amplitude octave band of the background noise would be correctly
perceived more often than speech with a fundamental frequency
within the same octave band of background noise. This hypothesis
was based on the assumption that background noise of the same
fundamental  frequency  would  mask  certain  perceptual  features  of 
the  synthesized  speech  warning  thereby  causing  a  degradation  in 
intelligibility.  Although  this  hypothesis  was  not  supported  by 
the  data,  pitch  of  the  synthesized  speech  warning  should  not  be 
disregarded  in  further  research.  It  is  possible  that  the  type  of 
noise  used  (simulated  helicopter  noise)  or  the  rather  unrealistic 
loudness  variability  may  have  contributed  to  this  variable's 
failure  to  reach  significance. 

With respect to speech rate, Simpson and Marchionda-Frost (1983)
hypothesized that increasing the rate at which a message is
presented (thereby decreasing the amount of time taken by the
message itself) will reduce comprehension time. The elimination
of redundant words from the message was also noted as a means of
reducing the temporal length of the message. However, this
method was disregarded since previous research suggests that this
technique tends to decrease intelligibility and increase response
time, presumably because linguistic redundancy is an important
perceptual feature of speech.

Interestingly, results indicate that increasing the speech
rate to 178 words per minute (wpm) (maximum number of wpm tested)
had no degrading effect on intelligibility and apparently reduced
the time taken to comprehend the message. However, subjects
(who were also pilots) indicated a preference for messages
presented at a slightly slower rate of 156 wpm. At the fastest
presentation rate (178 wpm) some subjects indicated that they
feared missing parts of the message. The subjects also stated
that the slow message rate (123 wpm) diverted their attention
from the primary flight task because it took so long.
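
To make the three presentation rates concrete, the following sketch converts a speech rate into the time a message occupies the auditory channel. The six-word message length is a hypothetical value chosen only for illustration; the rates are those reported above, and the function name is an assumption.

```python
def message_duration_s(word_count, words_per_minute):
    """Seconds needed to speak word_count words at the given rate."""
    return 60.0 * word_count / words_per_minute


RATES_WPM = (123, 156, 178)  # rates tested in the study described above
WORDS = 6                    # hypothetical warning length, for illustration

for rate in RATES_WPM:
    print(f"{rate} wpm: {message_duration_s(WORDS, rate):.2f} s")
# prints:
# 123 wpm: 2.93 s
# 156 wpm: 2.31 s
# 178 wpm: 2.02 s
```

For a message of this assumed length, the slowest rate holds the pilot's attention almost a full second longer than the fastest, which is consistent in direction with the subjects' comment about the slow rate.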

This research has a number of implications. The effects of
synthesized voice pitch on intelligibility and comprehension
deserve further research, perhaps in a more realistic noise
environment. Compressed speech has been suggested for
use in the cockpit. Humans can process as many as 300 words per
minute with sufficient training, particularly if the information
conveyed is expected by the listener and highly redundant.
Voiced warnings and alerts in the cockpit are neither redundant
nor expected. Furthermore, the pilot will be performing numerous
other concurrent tasks while listening to voice warnings. It is
likely that the use of compressed speech will increase rather
than decrease the pilot's cognitive load. Furthermore, the
temporal savings in reduced message length will probably not
offset the cost in reduced intelligibility. Conversely,
synthesized voice messages presented at an unnaturally slow rate
should be avoided in the cockpit since they appear to divert
unnecessary amounts of attention.

INFLECTION  RATE  AND  AMPLITUDE  OF  SYNTHESIZED  SPEECH 

Filtering  techniques  will  soon  become  available  with  speech 
synthesizers  that  will  allow  the  user  to  change  the  inflection 
rate  and  amplitude  of  the  synthesized  speech.  This  capability 
will  permit  a  single  speech  synthesizer  to  produce  different 
types of voices. The implication for a cockpit application
is the possibility of using different synthesized voices for
different types of tasks in the cockpit. For example, changes in
the amplitude of the synthesized voice warning could convey
additional information as to the urgency of the warning, i.e., the
louder the warning, the more urgent. However, in implementing a
display design such as this, the amplitude must be regulated so
that the loudest warning does not overpower other cockpit
communication. Conversely, the amplitude of the warning must not
itself be overpowered by ambient cockpit noise. Clearly,
additional research concerning the perceptual implications of
these variables for a cockpit application is needed, particularly
because they hold promise for enriching synthesized speech with
more linguistic cues.

PRIORITIES OF VOICED MESSAGES, ALERTS, AND WARNINGS

Given  that  voice  warnings  are  and  will  be  used  in  the 
cockpit,  a  method  must  be  adopted  whereby  these  warnings  can  be 
assigned  a  priority  in  the  event  that  several  warnings  need  be 
conveyed simultaneously. On the assumption that only one message
can be presented at a time, the most important one must be relayed
to the pilot first. Less important messages must be queued with
respect to their urgency and then displayed following the pilot's
acquisition of the most urgent message.
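
One way to picture the queueing scheme described above is a priority queue in which the most urgent pending message is always presented next. This is only a sketch under stated assumptions: the class name, the convention that a lower number means greater urgency, and the sample messages are all hypothetical.

```python
import heapq
import itertools


class WarningQueue:
    """Holds pending voice messages and releases the most urgent first."""

    def __init__(self):
        self._heap = []
        # Counter breaks ties so equal-priority messages keep arrival order.
        self._order = itertools.count()

    def post(self, priority, text):
        """Queue a message; a lower number means more urgent (assumed)."""
        heapq.heappush(self._heap, (priority, next(self._order), text))

    def next_message(self):
        """Return the most urgent pending message, or None if empty."""
        if not self._heap:
            return None
        _, _, text = heapq.heappop(self._heap)
        return text


queue = WarningQueue()
queue.post(2, "FUEL LOW")
queue.post(1, "ENGINE FIRE")
queue.post(2, "HYDRAULIC PRESSURE LOW")
```

With these sample entries, successive calls to `next_message()` yield the priority 1 message first and then the two priority 2 messages in the order they were posted. How priorities should actually be assigned remains an open research issue.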

REPETITION  OF  VOICED  INFORMATION 

Related  to  the  issue  of  setting  priorities  for  voiced 
warning  messages  is  the  number  of  times  a  warning  should  be 
repeated to ensure acquisition by the pilot. This issue can be
approached in several ways: the message could repeat for a fixed
interval of time, the pilot could turn it off, or the message
could repeat until the problem was solved. In a study
which specifically addressed cockpit voice warnings for air
transport operations, Williams and Simpson (1976) reported that
pilots prefer a cancel button to deactivate voice warnings at
their discretion, especially if the warning is of high priority
(demands immediate attention). Alternatively, a spoken command
could also be used to end a warning. This study also revealed
that pilots preferred to have other less critical warnings
presented on a subsidiary display such as a CRT.
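
The three repetition approaches listed above can be compared side by side in a short sketch. The policy names, the function signature, and the 10-second default limit are assumptions made for illustration; none of them comes from the studies cited.

```python
from enum import Enum


class RepeatPolicy(Enum):
    FIXED_INTERVAL = "fixed_interval"  # repeat until a time limit expires
    PILOT_CANCEL = "pilot_cancel"      # repeat until the pilot cancels
    UNTIL_RESOLVED = "until_resolved"  # repeat until the problem is solved


def should_repeat(policy, elapsed_s, cancelled, fault_active, limit_s=10.0):
    """Decide whether a voiced warning should be spoken again.

    limit_s is a hypothetical default, not a recommended value.
    """
    if policy is RepeatPolicy.FIXED_INTERVAL:
        return elapsed_s < limit_s
    if policy is RepeatPolicy.PILOT_CANCEL:
        return not cancelled
    if policy is RepeatPolicy.UNTIL_RESOLVED:
        return fault_active
    raise ValueError(f"unknown policy: {policy!r}")
```

Writing the policies this way makes the trade-off visible: only the pilot-cancel policy gives the pilot direct control, which matches the preference reported by Williams and Simpson (1976).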

Not  all  of  the  messages  presented  to  the  pilot  via  speech 
synthesis  will  be  of  a  mission-critical  nature  in  the  LHX. 
Speech  displays  may  also  be  used  to  present  information  on 
request from the pilot. Regardless of the nature of the
information, since speech is by nature temporally restricted, a
visual replica of the auditory information should be provided to
the pilot for later reference. In fact, certain types of
information could be presented to the pilot in hard copy format
in conjunction with the auditory presentation. This approach is
well suited to information that the pilot will refer back to
later during the course of a mission. Specifically,
weather and navigation information is well suited for hard-copy
presentation.

RESEARCH  ISSUES 

Table 2 contains a summary of the research issues related to
the use of speech synthesis in the cockpit.

TABLE 2


SPEECH  SYNTHESIS  RESEARCH  ISSUES 


1.  What  are  the  effects  of  training  and  familiarity  on  the 
intelligibility  of  synthesized  speech? 


2.  How  do  the  various  commercially  available  speech  synthesis 
devices  compare  with  each  other  in  comprehensibility  and 
intelligibility? 


3. How does the pitch of the synthesized speech affect
intelligibility in the presence of actual helicopter noise?


4.  What  is  the  differential  intelligibility  and 
comprehensibility  of  different  voice  types  provided  by  a  single 
speech  synthesis  technique? 


5.  Is  there  an  appreciable  gain  in  information  transmitted  when 
several  different  voice  types  are  used  as  opposed  to  just  one? 


6.  Do  several  voice  types  complicate  rather  than  simplify  the 
pilot's  task? 


7.  Do  voice  messages,  alerts,  and  warnings  need  to  be  assigned 
priorities?  If  so,  what  is  the  optimum  way  to  assign 
priorities? 

8. How many times should a voiced warning be repeated?


9. How should voiced messages be terminated by the pilot?


10. Should there be a visual back-up display for an auditory
display of information to the pilot?


11. Is there any information that should be presented to the
pilot in hard copy (printout) format as opposed to soft copy
(CRT) or auditory?


FUNCTIONAL DESCRIPTION OF THE LHX AVIONICS SUITE

The primary reason for the Army's development of the LHX
family of light/scout attack helicopters has been the need for an
all weather aircraft with day/night capabilities. The LHX also
is being designed for defense of Army aviation. Mission
requirements will demand a considerable amount of nap-of-the-earth
(NOE) type flying, in which the helicopter is flying low
and fast and avoiding obstacles. The most outstanding and
challenging aspect of the LHX from a human factors design point
of view is the Army's desire to limit the operation of this
aircraft to a single crewmember. Current attack helicopter
missions require both a pilot and co-pilot. The co-pilot, seated
in front of the pilot, performs various weapon related functions
and relays verbal navigation commands to the pilot, whose primary
task is manual control of the helicopter. Even with two
crewmembers, workload is often quite high, especially during critical
attack mission segments when simultaneous target detection and
weapon release and control functions are occurring. Clearly, the
development of a single pilot cockpit will rely heavily on higher
levels of task automation than currently exist.

LHX mission functions can be generalized into four major
roles for the pilot: flight, offense, defense, and mission
management. Since the pilot can only fill one of these roles at
a time, the other roles must be automated to avoid overloading
him. This implies that the avionics system must allow the pilot
to perform whatever task is primary at the moment and
automatically perform the secondary tasks. These requirements
are necessitating design of the LHX based on advanced technology,
some of which may not yet be available. The avionics
architecture will employ an array of sophisticated sensors and
advanced concepts in integrating and controlling these sensors.
The Army's desire for a one-man crew, coupled with the new and
expanded mission capabilities, increases the need for innovative
design of display and control modes for the pilot, as well as
more automation.

As outlined in Honeywell's report to the Army Aviation
Research and Development Command (conducted under
DAAK50-81-C-0038), the primary subsystems which comprise the
current LHX avionics suite are:

1) Navigation

2)  Target  Acquisition  and  Attack 

3)  Flight  Control 

4)  Communication 

5)  Threat  Defense 

6)  Data  Management 

7)  Control  and  Display 

The  success  of  the  LHX  will  depend  upon  the  design  of  the 
control  and  display  subsystem  since  this  subsystem  provides  the 
pilot/aircraft  interface.  No  amount  of  technology  will  make  this 
aircraft  fully  operational  unless  a  prior  determination  is  made 
as  to  the  type  of  information  the  pilot  will  need  during  various 
mission  segments  and  the  rate  and  sense  modality  in  which  this 
information  should  be  transferred  between  the  pilot  and  the 
aircraft. In an effort to facilitate this information transfer
function between pilot and aircraft, the following concepts are
being considered for integration into the LHX:

1) No windows. Due to the problems associated with
infra-red radar signature, windows may be essentially eliminated from
the LHX cockpit. Thus, a wide field of view (60 by 160 degrees)
wraparound display will be used for pilotage and for the display
of flight control, targeting, threat detection, and fire control
symbology. This display will be consistent in terms of symbology
among all conditions of day, night, and adverse weather.

2)  A  terrain  mapping  display.  For  further  navigation 
functions,  a  digital  terrain  mapping  display,  operating  from 
digital  terrain  data  bases,  will  provide  threat  and  battlefield 
information.  Upon  pilot  request,  this  computer  driven  display 
will  also  have  the  ability  to  plot  courses  between  known 
waypoints.

3) A "display-by-exception" concept. This will be used for
system status monitoring in which information will be presented
to the pilot only if it is mission critical. Unlike current
cockpit design in which the pilot must scan numerous system
status instruments continually during flight, the
display-by-exception design will lessen the need for the traditional
continuous instrument scan, thereby reducing visual workload.

4)  Integrated  and  automated  systems.  These  will  be 
employed  in  an  effort  to  minimize  the  number  of  frequently 
executed  routine  operations  that  a  pilot  typically  performs. 

5) Voice technology. Voice interaction with the various
onboard subsystems will be used in this aircraft in a further
attempt to reduce pilot workload so that one-man operation is
feasible. Automatic speech recognition (ASR) will be used as an
alternate means of system control and for entering and receiving
flight information. Speech generation will be used as an
alternate means of information display.
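
The display-by-exception concept in item 3 above can be sketched as a filter that passes a system reading to the pilot only when it needs attention. The sketch assumes, purely for illustration, that "mission critical" can be approximated by a reading falling outside a normal operating band; the parameter names and limit values are hypothetical and do not describe the LHX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    low: float
    high: float


# Hypothetical parameters and bands, chosen only to illustrate the idea.
LIMITS = {
    "engine_oil_temp_c": Limits(40.0, 120.0),
    "rotor_rpm_pct": Limits(95.0, 105.0),
    "fuel_kg": Limits(100.0, float("inf")),
}


def exceptions(readings, limits=LIMITS):
    """Return only the readings that fall outside their normal band."""
    out = {}
    for name, value in readings.items():
        band = limits.get(name)
        if band is not None and not (band.low <= value <= band.high):
            out[name] = value
    return out
```

Given the readings `{"engine_oil_temp_c": 90.0, "rotor_rpm_pct": 92.0, "fuel_kg": 450.0}`, only the rotor speed would be passed on, so the pilot is spared the continuous scan of instruments that are within limits.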

Speech technology has been recommended specifically for the
following  functions  in  the  LHX: 

SPEECH  RECOGNITION 

1.  Automatic  target  recognizer  tasks 

2. Sensor (selection, mode, lock-on)

3. Terrain map display (request updates)

4. System monitoring (request information)

SPEECH  GENERATION 

1.  Alert  and  warning  messages 

2.  Feedback 

Speech  technology  was  chosen  for  these  tasks  particularly  to 
enhance  performance  in  multiple-task  situations  where  visual 
monitoring  and  manual  control  of  critical  tasks  will  be 
important.




SPEECH  INTERACTION 

A considerable amount of applied research has been directed
towards the use of speech recognition (speech input) as an
alternative to manual keyboard data entry and speech generation
(speech output) as an alternative to the visual information
presented on traditional aircraft annunciator panels. Optimal use
of speech technology in the cockpit, however, will be in an
interactive mode where speech input and output are logically
combined. In designing a truly voice interactive system,
attention must be given to easing pilot visual workload while
avoiding pilot auditory overload.

Voorhees, Marchionda, and Atchison (1982) conducted a
study in which they assessed the use of speech technology in a
simulated helicopter NOE environment. Subjects in this study
performed an extremely demanding visual/manual tracking task.
Crucial airspeed, altitude, and torque information was presented
to them in one of three ways. One group of subjects received
this information by traditional panel-mounted instruments (thus
requiring the subjects to divert attention from the primary task
when they needed such information). Another group of subjects
received the flight information in the form of thermometer-type
gauges that were arranged on the periphery of the CRT on which
the primary task was displayed. This condition simulated a
head-up type display. In the third condition subjects received a
visual display of only the primary task. When flight information
was needed, the subjects asked for it in the form of a single
spoken command, e.g., "airspeed", "altitude", and "torque." After
computer recognition of this command, synthesized speech feedback
provided the necessary information to the subject. In this
condition, the subject's visual attention could remain on the
primary task at all times.

Results  of  this  study  indicated  that  flight  performance  in 
the  voice  interactive  condition  was  significantly  better  than 
flight  performance  in  the  other  two  conditions.  This  study  is 
interesting in that it not only exemplifies the merits, in
terms of improved flight performance, of using the auditory/vocal
channels as a means of acquiring information in a demanding
flight task, but also suggests that although HUDs eliminate the
need for the pilot to scan an instrument panel, there still may
be some unwanted diversion of visual attention associated with
the use of these displays.

As mentioned earlier, a number of voice tasks have been
recommended for integration in the LHX. One particular subset
of LHX functions may involve both speech input and output in the
use of an automatic target recognizer (ATR). In an ongoing
effort to develop an ATR for LHX attack and scout missions,
Honeywell has designed a Prototype Automatic Target Screener
(PATS). This system is capable of sensing, identifying, and
classifying ground targets using forward looking infra-red (FLIR)
or day TV imagery. In conjunction with the development of PATS,
Mountford, Schwartz, and Graffunder (1983) identified the
following pilot interactions with PATS that lend themselves to
speech technology implementation:

1. Enter navigation coordinates for recognizer search area

2. Select modes of PATS operation: search and designate

3. Request display of another detected target

4. Modify detection confidence criteria

5. Change target priorities

6. Assign weapons to targets

7. Retrain/reinforce target identification algorithm

Mountford, Schwartz, and Graffunder (1983) created a
simulation in which several of the PATS tasks (1, 2, 3, and 6) were
combined with a concurrent tracking task. The
navigation-targeting-weapon selection sequence of tasks associated with PATS
was performed repeatedly according to the following three task
control, interaction, and feedback formats:

     Input Modality     Feedback Modality

1.   Manual             Visual

2.   Speech             Visual

3.   Speech             Speech

The  overall  results  of  this  study  indicated  a  dual-task  (PATS 
and tracking tasks) performance advantage for speech-speech data
input  as  opposed  to  manual-visual  data  entry.  Although  tracking 
performance  error  doubled  when  the  tracking  and  PATS  tasks  were 
performed  concurrently,  tracking  error  was  lower  when  speech 
input  and  output  were  used  interactively  than  when  speech-visual 
or  manual-visual  input  and  output  modalities  were  used  for  the 
PATS  task.  Mountford  et  al.  attribute  the  performance  advantage 
for  speech  input  and  output  to  the  freeing  of  visual  and  manual 
resources  so  they  can  be  dedicated  solely  to  the  tracking  task. 

Results of this study also indicated that, particularly for
the navigation tasks, the time to complete this task was greatest
in the speech-speech modality. This result is not surprising
since the navigation task required the input of strings of digits
with feedback for each one. The time handicap for navigation
digit entry using speech could be overcome by the use of a
connected speech recognizer, which would allow the pilot to
string the digits together as one data entry as opposed to
several discrete entries. However, since speech is temporal by
nature, the additional time needed for the articulation of
feedback messages is inherent to this mode of information
transmission.

Thus, it appears that there is experimental evidence
suggesting that speech is desirable for the acquisition of
information in a demanding flight task. The next question is how
this voice interactive dialogue between human and machine should
be designed. Either speech or manual input to the avionics suite
requires verification that the correct input was received. In a
non-critical mission segment, visual feedback supplied via CRT
may be adequate. However, during mission segments in which heavy
visual demands are placed upon the pilot, auditory feedback will
be most desirable, as will voice input. Taken a step further,
structuring the interactive dialogue between the pilot and the
aircraft will be facilitated with the additional capabilities
proffered by speech input and output. To date, information has
been presented visually to the pilot and controlled through
multifunction keyboards. The addition of speech I/O to future
cockpits will provide complete hands-off, eyes-off interaction
with various onboard systems.

In structuring this dialogue it is critical that the user be
supplied with the knowledge that previous responses have been
input and recognized by the system correctly. Mountford, North,
Metz, and Warner (1982) have examined three types of dialogues
for man/machine communication which they characterize as
"succinct", "intermediate", and "verbose" depending upon the
wordiness of the dialogue. Results of this study indicate that
succinct dialogues are preferable to the more verbose dialogues
primarily because they require less involvement from the pilot in
terms of time and attention. This work highlights the importance
of keeping aircraft/pilot interactions brief and to the point.

Furthermore, interaction between pilot and aircraft systems
must be as natural for the human as possible. One of the
advantages of using speech as a mode of interaction with onboard
systems (as opposed to visual/manual interaction) is that speech
is the most natural mode of communication for humans. Efforts
must be made to capitalize on this naturalness by incorporating
enough flexibility into this communication link so the pilot can
communicate his/her intentions to the aircraft in much the same
way as she/he would to another crewmember. Conceptualizing and
creating an optimal voice interactive dialogue based on
pilot-to-co-pilot communications will necessarily require
considerable thought and artificial intelligence. In human
communication, specifically pilot/co-pilot communications, a
great deal of intent is inferred by the crewmembers involved in
the communication. This means that certain things are done or
assumed by the communicators based on the characteristics of the
given situation. This implies that somehow the machine must be
apprised of or be made smart enough to infer certain mission and
situation specifics. The accomplishment of this non-trivial task
will provide the added flexibility characteristic of human
communication to the man/machine communication link that will
begin to allow full realization of the potential for speech
technology in the cockpit. The purpose of this paper is not to
expound upon artificial intelligence and its many cryptic
interpretations. Let it suffice to say that heightened and
continuing research in this area will be highly beneficial to the
creation of this very important communication link in future
generation rotorcraft.

An issue that is presently under debate relates to whether
the pilot should be provided with reversionary controls in the
event of a voice system failure. Should there be a manual
backup for tasks that have been allocated to voice command, and
should there be visual backups for auditory displays of
information? Reversionary controls may be important for
psychological as well as technical reasons. Certain situations
may arise in which the pilot will simply feel more comfortable
performing a task manually rather than verbally.

RESEARCH  ISSUES 

Table 3 contains a summary of the research issues related to
speech interaction in the helicopter cockpit.

TABLE 3


SPEECH INTERACTION RESEARCH ISSUES


1. With respect to the structure of a voice interactive system
between pilot and aircraft (avionics suite), it has already been
established that succinct dialogues are preferable to wordy
dialogues. What other general rules can be derived to govern the
integration of speech interaction in the cockpit?


2. Does the pilot need reversionary controls?


3. What are the psychological implications of not providing
the pilot with reversionary controls?


AUTOMATIC SPEECH RECOGNITION TECHNOLOGY ASSESSMENT


The specifications outlined in this technology assessment
are slanted towards those which are of importance in a cockpit
application. The specifications are by no means exhaustive and
may not do justice to some of the products that have been
developed for applications other than those involving cockpit
integration.

An  issue  which  needs  clarification  prior  to  reading  this 
assessment  is  the  configuration  of  the  various  speech  recognition 
products.  There  are  three  basic  types  of  configurations  into 
which  most  speech  products  fall.  First,  the  technology  may  be 
integrated  into  a  "development  system."  This  means  that  it  has 
been  factory  interfaced  with  a  computer  prior  to  its  purchase  by 
the  user.  "Development  systems"  typically  come  with  software  to 
aid  in  the  application  development.  Second,  there  are 
"standalone"  systems  that  communicate  with  the  host  processor 
chosen  by  the  user.  This  means  that  the  user  buys  a  board-level 
product and then interfaces it to his or her own host computer.
Typically, this type of system requires the creation of a
considerable  amount  of  software  on  the  part  of  the  user. 
Finally,  speech  recognition  products  may  be  in  the  form  of 
standard  or  custom  OEM  chips  to  accommodate  a  wide  range  of  form 
factor  and  interface  requirements. 

In  many  cases,  this  technology  assessment  details  only  one 
product from a particular manufacturer. This does not mean that
the manufacturer has no other speech products available; it
simply means that the system chosen for assessment is the one
most feasible for a cockpit application.

Another  issue  that  must  be  clarified  before  continuing  with 
the assessment concerns flightworthy ASR systems. Several
manufacturers are currently working on these; however, it must be
noted that these systems are still in the design and development
phase,  with  a  considerable  amount  of  work  still  needed  to  make 
their  use  feasible  in  the  flight  environment. 

First,  a  table  (Table  4)  will  be  presented  in  which  the 
pertinent  specifications  for  each  speech  recognition  device  are 
summarized.  This  will  be  followed  by  a  more  detailed  description 
of  each  of  the  assessed  devices. 

AUTOMATIC SPEECH RECOGNITION HARDWARE SPECIFICATIONS

SEE WRITTEN DESCRIPTION FOR MORE DETAIL


Intel 

Intel  produces  what  they  call  a  speech  transaction  family  of 
speech  products.  The  speech  transaction  board  is  available  for 
$2,900.00 and is the actual speech recognition hardware. The
speech  transaction  development  set  ($4,900.00)  is  the 
accompanying  operating  system  and  software  which  allows  the  user 
to  integrate  the  hardware  into  an  actual  application.  Intel  is 
currently  in  the  process  of  making  several  major  updates  to  their 
speech  transaction  family  of  products;  an  improved  recognition 
algorithm which will provide better consonant discrimination will
be  implemented  for  the  speech  transaction  board.  In  addition, 
the  ability  to  maintain  several  templates  for  each  vocabulary 
word  may  also  be  implemented.  For  this  reason,  the  number  of 
training  passes  needed  to  use  this  device  is  undecided. 
Additional  noise  processing  will  be  added  with  an  algorithm  that 
will  measure  the  background  noise  between  words  and  subsequently 
subtract  this  noise  from  the  speech  signal.  The  impulse  noise 
filter  will  also  be  enhanced.  The  actual  levels  of  noise  to 
which  this  device  is  immune  are  as  yet  undetermined.  The  speech 
transaction  development  set  will  also  be  expanded  with  additional 
software.

Although  Intel  does  not  specifically  offer  speech  output 
capabilities with this system, they have provided the means for
the  user  to  integrate  his  or  her  own  speech  synthesis  device  with 
this  system.  Intel  anticipates  a  full  release  of  these  expanded 
capabilities  in  November,  1983. 

Interstate

Interstate offers a wide variety of speech recognition
equipment, ranging from chips to fully integrated voice
recognition terminals. Many of their products are designed to
operate with specific host computers such as Lear Siegler
Incorporated and Digital Equipment Corporation computers.
Interstate also offers several types of speaker-independent
speech recognition chips.

SYS 300

The  SYS  300  is  a  board-level,  speaker-dependent  speech 
recognition  system  designed  specifically  to  be  interfaced  to  most 
RS  232C  terminals.  There  are  approximately  15  recognition 
commands  that  may  be  used  in  creating  application  software  for 
this  device.  The  device  is  capable  of  recognizing  up  to  100 
words.  Interstate  claims  that  the  SYS  300  is  resistant  to  noise 
levels  up  to  80  dB(A).  A  voice  output  module  (VTM  150)  may  be 
purchased  for  $995.00  and  interfaced  to  the  SYS  300.  The  VTM 
150  includes  a  fOO-word  fixed  vocabulary  and  1000  word  user- 
programmable  vocabulary  with  text-to-speech  capabilities. 

ITT 

ITT has developed a flightworthy speaker-dependent isolated
word  recognition  system  for  the  tactical  aircraft  cockpit 
environment.  This  system  was  designed  to  withstand  the  high 
"g"  levels,  high  noise  levels,  and  oxygen  mask  breath  noise 
inherent  in  the  tactical  aircraft  cockpit. 

To a large extent, this device is still in the developmental
stage. However, preliminary flight testing aboard the Air Force
Technology Integrator (AFTI) F-16 indicated that the ITT system
maintained a recognition accuracy rate of approximately 90% in
high "g" and noise levels (5 "g" and 115 dB(A), respectively).
ITT is working to integrate speech synthesis capabilities in this
device as well as connected word recognition capabilities.

Lear Siegler Incorporated

Lear Siegler has also developed a flightworthy, tactical
Voice  Controlled  Interactive  Device  (VCID)  for  military 
application  flight  testing.  This  system  was  designed  to  operate 
in  the  same  operating  environment  as  the  ITT  speech  recognition 
device.  Lear  Siegler  claims  that  this  system  can  be  trained  on 
the  ground  in  a  low  noise  environment  prior  to  use  in  flight. 
This device can accommodate a maximum vocabulary size of 256 words
or  short  phrases.  A  speech  synthesis  unit  will  be  available  with 
the  VCID  for  operator  feedback. 

The  VCID  has  undergone  preliminary  flight  testing  aboard  the 
AFTI  F-16.  Results  indicated  that  in  noise  levels  up  to 
approximately 103 dB(A), recognition accuracy rates were in the
80% to 90% range. Beyond 103 dB(A), however, recognition
accuracy declined abruptly. During later portions of these
flight tests, Lear Siegler added a Speech Enhancement Unit (SEU)
to the VCID which appeared to raise these recognition accuracy
rates by several percentage points. The SEU is basically a
front-end processor which samples the background noise and
subtracts  it  from  the  speech  signal. 
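
The idea of sampling the background noise and subtracting it from
the speech signal can be illustrated with a minimal sketch. It
assumes, purely for illustration, that the signal has already been
divided into frames of per-band energy values. The function names,
the frame layout, and the zero floor are hypothetical and do not
describe the actual design of the SEU.

```python
from typing import List, Sequence


def estimate_noise(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-band energy over frames that contain no speech."""
    if not noise_frames:
        raise ValueError("at least one noise frame is required")
    n_frames = len(noise_frames)
    n_bands = len(noise_frames[0])
    return [sum(frame[b] for frame in noise_frames) / n_frames
            for b in range(n_bands)]


def subtract_noise(frame: Sequence[float], noise: Sequence[float],
                   floor: float = 0.0) -> List[float]:
    """Subtract the noise estimate band by band, never going below `floor`."""
    return [max(s - n, floor) for s, n in zip(frame, noise)]


# Hypothetical two-band example: two noise-only frames, then one speech frame.
noise = estimate_noise([[2.0, 4.0], [4.0, 8.0]])
cleaned = subtract_noise([10.0, 5.0], noise)
print(noise, cleaned)
```

The floor keeps a band from going negative when the noise estimate
exceeds the measured energy, which is one common design choice
among several.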

Lear  Siegler  is  currently  making  modifications  to  the  VCID 

NEC

NEC has two speech recognition systems on the market: the SR
100  and  the  DP  200. 

DP  200 

For  $15000.00  NEC  offers  a  speech  recognition  system  that 
will  provide  the  user  with  up  to  ■ 20  s  of  connected  word 
recognition.  A  maximum  vocabulary  of  150  words  can  be  used  in 
the  connected  mode,  and  a  maximum  of  500  words  can  be  recognized 
in  the  discrete  mode.  The  DP  200  comes  with  two  floppy  disc 
drives,  an  operating  system,  and  various  software.  Template 
handling can be done either internally in the DP 200 or through
the  host  computer.  The  benefit  associated  with  allowing  the 
templates  to  be  handled  by  the  DP  200  is  that  it  frees  the  host 
from  continually  having  to  monitor  the  interface  line.  This 
internal  control  process  essentially  preprocesses  and  buffers  the 
incoming  speech  information  before  sending  it  to  the  host.  The 
DP  200  requires  minimal  training  and  provides  a  retrain 
capability  for  select  parts  of  the  vocabulary.  NEC  claims  that 
the DP 200 will withstand up to 85 dB(A) of random noise.

Speech  output  capabilities  may  be  added  to  this  system  for 
an additional $4,600.00. This audio response unit uses a
digitization technique and will provide the user with either 90 s
of speech at 16 kb or 60 s of speech at 32 kb.

SR  100 

The SR 100 is the only low cost ($2000.00) speech
recognition product on the market that has connected word
recognition capabilities. This high speed option or Quiktalk
mode  allows  a  maximum  string  of  10  words  to  be  recognized  in  a 
connected  fashion.  Two  training  passes  are  required  for  these  10 
words.  In  both  the  discrete  and  Quiktalk  mode,  the  SR  100 
maintains  each  template  separately  in  memory. 

To  interface  the  SR  100  to  a  host  computer,  there  are  seven 
user  definable  parameters.  For  an  additional  $2000.00  a  voice 
output device (AR 100) can be purchased to work with the SR 100.
The  AR  100  provides  120  s  of  digitized  speech. 

Although the SR 100 was not designed for use in the aircraft
environment, United Technologies conducted an in-house test of
the SR 100 in three noise conditions using two different types
of microphones. The noise levels tested were 20 dB ambient
noise, 85 dB S-76 cockpit noise, and 100 dB UH-60 cockpit noise.
The tests were conducted using both a throat microphone and a
Shure noise cancelling microphone. For the digit vocabulary
using the throat microphone, the SR 100 achieved a recognition
accuracy rate of 96% across all three noise conditions. Using
the Shure noise cancelling microphone, an accuracy rate of
approximately 97% on the digit vocabulary was obtained across all
three  noise  conditions. 


Scott  Instruments 

Scott Instruments offers a low cost ($795.00) speaker-
dependent speech recognition system. The Voice Entry Terminal
(VET)  includes  a  terminal,  a  microphone,  a  microcomputer 
interface,  user's  manual,  and  system  software.  The  VET  is 
designed  specifically  to  work  with  Apple  Computers.  The  noise 
immunity  of  this  system  is  unspecified. 

Vertex 

The Vertex 1800 is a high cost ($80,000.00) speaker-
independent, connected word recognizer. This device was designed
specifically to allow a user to communicate with a computer or a
telephone switching system by talking to it over any telephone.
The system can accommodate up to eight users simultaneously.
Speech output (digitized speech) is an option with the Vertex
1800. The minimum recognition vocabulary consists of 10 digits
(0-9) and "yes" and "no". This vocabulary may be expanded up to
50 words. The speech output vocabulary includes up to 32 words
or 16 s of speech and can be expanded up to 512 words or 256 s
of speech.

Votan 

Votan offers the following types of speech technology:
speaker-dependent and independent recognition, speech output,
voice store and forward, vocoding, and speaker verification.
Various combinations of these features are available either in
system, standalone, or board form.

V5000 

The V5000 combines speaker-dependent word recognition,
speech output, and voice store and forward capabilities in a
standalone unit. Votan's speech recognition technology requires
one or two training passes and the resultant templates are stored
separately in memory. The recognition response time for the
V5000 is 130 ms plus an additional 2 ms for each word in the
vocabulary. It must be noted that if syntax structures are used,
the response time would be 130 ms plus 2 ms for each word in the
syntax node (as opposed to 2 ms for each word in the entire
vocabulary).
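
The response-time relationship described above is a linear
function of the number of words the recognizer must compare
against. The sketch below works it through, taking the 130 ms
base figure and the 2 ms per-word figure from the text. The
100-word vocabulary and the 10-word syntax node are hypothetical
sizes chosen only to show the effect of syntax structuring, and
the function name is illustrative.

```python
def response_time_ms(active_words: int, base_ms: float = 130.0,
                     per_word_ms: float = 2.0) -> float:
    """Estimate recognition response time in milliseconds.

    active_words is the number of candidate words: the whole
    vocabulary without syntax, or one syntax node with syntax.
    """
    if active_words < 0:
        raise ValueError("active_words must be non-negative")
    return base_ms + per_word_ms * active_words


# Hypothetical sizes: a 100-word vocabulary versus a 10-word syntax node.
without_syntax = response_time_ms(100)
with_syntax = response_time_ms(10)
print(without_syntax, with_syntax)
```

Under these assumed sizes the estimate falls from 330 ms to
150 ms, which is why restricting the active word set through
syntax matters for larger vocabularies.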

The speech output available from Votan is digitized and is
user programmable. The user has a choice of three bit rates for
the  speech  digitization. 

The  voice  store  and  forward  technology  allows  speech  to  be 
digitized,  compressed,  and  stored  in  RAM  memory.  The  speech  can 
then  be  transferred  to  a  host  processor  or  a  mass  storage  device. 
This  information  may  be  retrieved  in  audio  form  by  reconverting 
the digital data back to an analog signal.

The noise immunity of the V5000 was recently tested at NASA-
Ames Research Center (Coler, 1982). For the purposes of this
test, the V5000 was trained on the digit vocabulary (0-9) in
quiet and recognition was attempted both in quiet and in 100
dB(A) noise. The V5000 was also trained in 100 dB(A) noise and
recognition was attempted again in both quiet and 100 dB(A)
helicopter  noise.  Results  indicated  that  from  a  grand  total  of 
3,200  utterances  (collected  from  eight  subjects)  only  one  miss  or 
substitution  error  occurred  and  there  were  no  rejections. 
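
The reported result can be restated as an accuracy percentage.
The short calculation below uses the figures given above (3,200
utterances, one miss or substitution error, no rejections). The
function name and the decision to count rejections against
accuracy are assumptions made for illustration.

```python
def recognition_accuracy(total: int, errors: int, rejections: int = 0) -> float:
    """Return the percentage of utterances recognized correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    if errors < 0 or rejections < 0 or errors + rejections > total:
        raise ValueError("errors and rejections must fit within total")
    correct = total - errors - rejections
    return 100.0 * correct / total


accuracy = recognition_accuracy(total=3200, errors=1, rejections=0)
print(f"{accuracy:.3f}%")
```

With one error in 3,200 utterances the accuracy works out to
about 99.97%.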

Votan is currently working on making continuous word

PILOT QUESTIONNAIRE


THE  ROLE  OF  SPEECH  INPUT  AND  OUTPUT  IN  THE  HELICOPTER  COCKPIT 

At  NASA-Ames  Research  Center,  we  are  currently  examining  the 
potential  uses  for  voice  warning  and  control  systems  in  future 
helicopter  cockpits.  As  you  know,  current  rotorcraft  operations 
require  manual  input  (in  the  form  of  switch  manipulations,  flight 
control,  etc.)  to  the  various  on  board  systems  and  provide  visual 
and  auditory  output  to  the  pilot  in  the  form  of  flight  instrument 
displays and alerting signals (horns, buzzers, etc.). We have
acknowledged  that  the  visual  and  manual  demands  placed  on  the 
helicopter  pilot  are  at  times  excessive.  Our  work  on  speech 
technology  in  the  cockpit  is  aimed  at  reducing  or  offloading 
these  demands  as  well  as  increasing  the  utility  of  the  aircraft. 

An  avionics  system  into  which  speech  technology  is 
integrated  would  involve  "speaking"  to  an  on  board  computer 
commanding  it  to  perform  switch  sequences,  requesting  information 
from  the  various  aircraft  systems,  etc.  The  system  would 
recognize your voiced command, perform the requested task, and
report  back  verbally,  if  requested,  that  the  task  has  been 
completed.  In  addition  the  system  could  give  you  warning  and 
advisory  information  verbally  rather  than  visually. 

This  questionnaire  is  divided  into  two  sections.  In  the 
first  section,  we  have  listed  some  tasks  that  might  be  performed 
by voice in an existing helicopter, the AH-1. Because you have
had experience flying this helicopter, we would like you to
evaluate each of these tasks with respect to the potential
desirability of having speech perform these tasks. When you
respond to these questions assume that a computer has been added
to  the  aircraft  and  that  you  have  the  ability  to  control  on  board 
systems  by  voice  and  receive  various  types  of  information 
verbally  from  this  system.  Our  goal  in  this  part  of  the 
questionnaire  is  to  determine  what  types  of  tasks  you  think  will 
be  best  suited  for  voice  technology. 

The  second  section  of  the  questionnaire  will  give  you  the 
opportunity to think about the design of future rotorcraft and to
tell  us  what  you  would  like  if  virtually  any  cockpit  design 
becomes  possible. 

QUESTIONNAIRE

The  following  questions  are  to  provide  you  with  an 
opportunity to contribute your ideas and opinions about how
speech  technology  might  be  implemented  in  the  AH-1.  Your  ideas 
are  valuable  and  important  since  the  information  obtained  from 
this  questionnaire  will  provide  guidelines  for  the  implementation 
of  this  technology  in  future  rotorcraft. 

The  personal  data  sheet  is  for  the  purpose  of  data  analysis 
only.  No  comments  or  answers  will  be  associated  with  your  name. 

Please  answer  each  question  carefully.  The  more  comments  and 
examples  you  have  with  respect  to  these  tasks  the  better  (please 
write them on the back of the page).

The  five  point  scale  provided  after  each  question  provides  a 
continuum  of  desirability,  from  extremely  undesirable  to 
extremely desirable. Please indicate your opinion by circling
the  number  which  best  describes  your  opinion. 

EXAMPLE

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable


PERSONAL DATA

Name/Rank _

Organization _  Position _

Age _  Date _

PILOT EXPERIENCE. Please approximate hours by type.

Rotorcraft type          Hours Total

Do you or have you flown fixed wing aircraft?

Yes _  No _

Aircraft type            Hours Total

Do you play video games?

Often _  Occasionally _  Never _

Do you own a home computer?

Yes _  No _

Have you taken any computer programming courses?

Yes _  No _

Have you ever written a computer program?

Yes _  No _

Have  you  ever  heard  computer  generated  speech? 

Yes _  No _ 

If  yes,  please  explain. 


Have  you  ever  used  an  automatic  speech  recognition  device? 

Yes _  No _ 

If  yes,  please  explain 


Please  provide  other  comments  on  attitudes,  education  or 
experience  that  might  influence  your  answers  to  this 
questionnaire.

PILOT QUESTIONNAIRE

Computer generated speech could be used to advise you when
certain parameters or systems move outside safe operating limits
or become inoperative. A number of these are listed below. For
each one rate how desirable it would be to have an advisory or
warning about it presented by voice.

1. ENGINE OIL TEMPERATURE

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

2. ROTOR RPM

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

3. ENGINE RPM

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

4. TORQUE PRESSURE

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

5. TGT (Turbine Gas Temperature)

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

6. ENGINE OIL PRESSURE

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

7. ENGINE OIL BYPASS

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

8. FWD or AFT FUEL BOOST

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

9. ENG FUEL PUMP

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

10. 10% FUEL REMAINING

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

11. FUEL FILTER

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

12. XMSN OIL BYPASS

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

13. XMSN OIL PRESS

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

14. XMSN OIL HOT

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

15. HYD PRESS #1 or #2

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

16. INST INVERTER

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

17. DC GENERATOR

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

18. CHIP DETECTOR

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

19. IFF

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

Given the installation of a variety of sensors throughout the
aircraft, advisory and/or warning information could be presented
to you by voice. Items 20-23 deal with this type of information.
For each one please indicate how desirable it would be to have
this information presented to you by voice.

20. ... that the helicopter ... when it is being parked ... been
grounded during ...

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable


21. Advise if ground safety pins have not been installed in the
pilot and/or gunner canopy removal arming/firing mechanisms when
the helicopter is to be parked.

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable


22. Warn if carbon monoxide, smoke etc. is detected in the
cockpit.

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

23. Advise if stores jettison safety pins have not been
installed when helicopter is on the ground.

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

24. How desirable would it be to have a voice generator assist in
performing checklist items?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

25. Would it be desirable to have exact threat information details
presented to you verbally, for example, "SA10, 4 o'clock, launch"?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable


26. Assume for the moment that your aircraft is data-linked to
the ground. Would you find it desirable to be able to request
targeting information and receive it verbally?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

27. Assume that the entire aircraft manual is stored in the
aircraft computer's memory and is accessible to you in
flight. Would it be desirable to request information from the
manual by voice command and receive it verbally?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

28. Would you like to be reminded when certain tasks should be
done, for example, "change IFF"?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable


29. Can you think of any other type of information you would
like to receive from a voice generator?


31. Please rank the following three ways in which warning
information could be presented to you by voice. A rank order of
one (1) means most desirable and three (3) means least desirable.
Please use each ranking only once.

_ The voice generator would say something like "Caution",
which would then alert you to look to the instrument
panel for a problem.

_ The voice generator could tell you exactly what is out
of tolerance, eg. "Warning, oil pressure low".

_ The voice generator could tell you exactly what is out
of tolerance, by how much, and a recommended course of
action.

Some other method. Please elaborate.


32. If a voice generator is used as an aid in performing
checklist items, please rank the following ways in which it
could be implemented. A rank order of one (1) means most
desirable and four (4) means least desirable. Please use each
ranking only once.

_ The voice generator could call out each item in the
checklist for you to perform.

_ You could run through the checklist, following which the
voice system could remind you of any items that may have been
overlooked or of any conditions which might preclude safe
operations.

_ All checklist items could be placed under computer
control and performed automatically for you. The voice generator
would then advise you when the checklist had been completed or if
any irregularities were encountered.

_ The voice generation system could call out items in the
checklist when you request it to.


33. The following items comprise the general categories of
functions for which computer generated speech might be used.
Please rank these categories from one to four, with one (1)
meaning the most desirable for computer generated speech and four
(4) meaning the least desirable. Please use each ranking only
once.

_ Presentation of advisory/cautionary information eg. "oil
pressure low".

_ Presentation of general information that has been
requested by the pilot eg. "EGT 670 degrees".

_ Presentation of feedback or acknowledgment that tasks
have been completed, for example, "Outboard stores
selected".

_ Presentation of emergency information eg. "rotor RPM
low".

... your voice ... could be used to request information from
various ... by saying, for example, "Request altitude", ... with
"120 ft". For each of the ... types of information, please rate
how desirable it would be to request this information with a
spoken command. When ..., assume that the machine will recognize
your command with the accuracy of a human listener.

34. ...

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

35. ...

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

36. ...

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

37. OAT (Outside Air Temperature)

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

38. HEADING

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

39. TIME

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

40. TORQUE

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

41. ROTOR RPM

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

42. How desirable would it be to turn cockpit lighting on and off
by voice command?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

43. Voice command could be used to reset circuit breakers. How
desirable would it be to perform this task by voice?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable

44. How desirable would it be to tune the radios by voice?

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable


45.  If  you  had  to  use  voice  command  to  tune  radios,  would  you 
rather  tune  the  radio  by  frequency  (eg.  "Tune  256.4")  or  by  name 
of the station (eg. "Tune Moffett Tower")? Please circle your
preference.

FREQUENCY  NAME 


46. Would you find it desirable to configure the voice security
equipment by voice, using one command to accomplish all the
tasks? For example, "Set plain mode."

1            2            3     4          5
Extremely    Somewhat     Not   Somewhat   Extremely
Undesirable  Undesirable  Sure  Desirable  Desirable
47. Similarly, would you like to configure the ADF by saying
"Tune Evansville NDB, loop mode"?

The computer would then perform the following tasks for you:

A) Tunes Evansville ADF

B) Identifies the station

C) Indicates whether you are receiving a reliable signal
from the station

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable


48. A voice command could be used to request fuel required and
burn-out times during a mission. How desirable would it be to
perform this task by voice?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable

49. A voice command could be used to select the type of weapon
you wish to use in the AH-1, by saying, for example, "Select
turret". Would this be a desirable candidate task for speech input
and output?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable

50. Voice command could be used to select the particular weapon
station you wish to use. Would it be desirable to perform this task
with speech?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable



51. The number of weapons to be fired could be specified by
voice command. Would this be a desirable task for speech?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable

52. The firing sequence of these weapons could also be specified
by voice command. Would it be desirable to perform this task by
voice?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable

53. Voice command could be used to [illegible] weapons. Would it be
desirable to have this capability?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable

54. Would it be desirable to jettison stores when necessary by
voice command?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable

55. If your aircraft was equipped with an automatic hover hold and
bob-up mode, how desirable would it be for you to control these
modes by voice command?

1             2             3       4           5
Extremely     Somewhat      Not     Somewhat    Extremely
Undesirable   Undesirable   Sure    Desirable   Desirable
56. The following items comprise some of the general categories
of tasks for which voice command might be used in helicopter
operations. Please rank the desirability of performing these
types of tasks by spoken command from one (1) to seven (7), with
one meaning the most desirable and seven meaning the least
desirable. Please read all areas before ranking them, and use
each ranking only once.


____ Vehicle control, for example, "Bob-up"

____ Weapon stores management, for example, "Select [illegible]"

____ Navigation tasks, for example, "Tune Evansville VOR."

____ Communications, e.g. "Tune Moffett Tower"

____ Subsystem management, for example, "HUD on."

____ Weapon delivery, e.g. "Launch TOW"

____ Requesting flight instrument information by voice
     command and receiving that information from a speech
     system, e.g. "Torque" - "[illegible] Percent".

57. If you had the ability to use voice command in the cockpit,
there are several ways in which you could activate the system
(i.e. let it know that you are talking to it). Please rank the
desirability of the following activation methods from one (1) to
three (3) with one meaning the most desirable and three meaning
the least desirable.

____ Push-to-talk switch

____ Have the voice system actively listening for your
     spoken command all the time.

____ Say a keyword which would activate the system prior to
     speaking the actual command.

Some other method. Please elaborate.


58. Comments: Please comment on the use of voice command for
tasks in the above categories. Give us examples of any other
categories of tasks for which voice command might be used in the
AH-1.
Section 2

In the following section we would like you to think
seriously about how you would design a future pilot/aircraft
interface (cockpit) in an effort to make your job easier and
safer. Don't worry about whether your ideas are technologically
feasible--treat them as if anything is possible. You will be
given six areas to respond to; answer them from a scout/attack
type of mission standpoint. Within each area, tell us what
cockpit changes you would like to see in current rotorcraft, what
you would like in a future rotorcraft, and how you would like to
have it done. If you discuss a design change in an existing
rotorcraft, be sure to specify which one. We also want to know
how you would like to interact with your helicopter in each of
these areas. In other words, for each change or idea you have,
tell us whether you would like to use speech input and/or output,
visual/manual input and output, some combination thereof, or
something completely different. Sketches, if applicable, might
help us understand your ideas better.

Things to remember when completing this section

1. Be specific

2.  Don't  worry  about  writing  style,  etc.  (just  be  legible) 

3.  Disregard  current  technological  constraints 

4.  Sketch  your  ideas  on  the  back  of  each  page  if  you  like. 